ABSTRACT

A classification algorithm incorpora- 
ting contextual information in a general, 
statistical manner is presented. Methods 
are investigated for obtaining adequate es- 
timates of the context distribution (a sta- 
tistical characterization of context) upon 
which the classification algorithm depends. 
Finally, a method of estimating optimal al- 
gorithm parameters prior to performing pre- 
liminary classifications is explored. 


I. INTRODUCTION

The most widely used method for clas- 
sifying remotely sensed data from such 
sources as multispectral scanners on
aircraft or satellite platforms is a
point-by-point classification technique in
which data from each pixel in the scene are
classified individually by a maximum
likelihood classifier [1]. The information
normally
used by this classifier is only spectral 
or, in some cases, spectral and temporal. 
There generally is no provision for using 
contextual information. 

In contrast, when scanner data are 
displayed in image form, a human analyst 
routinely uses context to help decide what 
is in the imagery. Using context, he may 
be able to easily pick out roads, delineate 
boundaries of agricultural fields, and dif- 
ferentiate between grass in an urban set- 
ting (lawns) and grass in an agricultural 
setting (pasture or forage crops) where a 
maximum likelihood point classifier would 
have much difficulty in doing so. 

Recently we have developed a
classification algorithm which incorporates
contextual information in a general,
statistical manner [2]. This algorithm
exploits the tendency alluded to above of
certain ground-cover classes to be more
likely to occur in some contexts than in
others.

This research was funded in part by
National Aeronautics and Space
Administration Contract No. NAS9-15466 and
National Science Foundation Grant
MCS78-04366.

An estimate of the "context distribu- 
tion" (a statistical characterization of 
the context in the scene to be classified) 
must be made before this classification
algorithm can be used. Methods are
investigated here for obtaining
sufficiently accurate estimates of the
context distribution. The process of
estimating the context distribution can
involve a large number of preliminary
classifications using
the statistical context classifier. With 
the goal of limiting the number of prelimi- 
nary classifications needed, a method of 
predicting the optimal algorithm parameters 
without performing classifications is ex- 
plored. 


II. THE CLASSIFICATION MODEL 

Remote sensing imaging systems
generally provide data in the form of a
two-dimensional array of
$N = N_1 \times N_2$ pixels of fixed but
unknown classification. Let the observation
at image coordinates $(i,j)$ be $X_{ij}$
and the true but unknown classification at
that image point be
$\theta_{ij} \in \{\omega_1, \ldots, \omega_m\}$,
where $m$ is the number of cover classes
represented in the scene and $\omega_k$ is
the $k$th cover class. Associated with each
$X_{ij}$ and $\theta_{ij}$ is a
class-conditional density
$p(X_{ij} \mid \theta_{ij})$. The maximum
likelihood point classifier estimates each
$\theta_{ij}$ in the following way: Decide
$\theta_{ij} = \omega_k$ if and only if
$g_k(X_{ij}) \ge g_l(X_{ij})$ for all
$l = 1, 2, \ldots, m$, where $g_k(X_{ij})$
is the discriminant function

$$g_k(X_{ij}) = p(X_{ij} \mid \omega_k)\, p(\omega_k) \qquad (1)$$

and $p(\omega_k)$ is the prior probability
of class $\omega_k$ occurring in the scene.
Usually a good estimate for $p(\omega_k)$
is not known (or even sought), and the
approximation $p(\omega_k) = 1/m$ is used
(uniform priors).
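
The decision rule of Eq. 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class-conditional densities are taken to be univariate normal, and the names `gaussian_pdf`, `classify_point` and the two-class parameters are hypothetical, not part of the work reported here.

```python
import math

def gaussian_pdf(x, mean, var):
    """Univariate normal density, standing in for p(X | w_k) (an assumption)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def classify_point(x, class_params, priors=None):
    """Return the class index k that maximizes g_k(x) = p(x | w_k) p(w_k)."""
    m = len(class_params)
    if priors is None:
        priors = [1.0 / m] * m  # uniform priors, p(w_k) = 1/m
    scores = [gaussian_pdf(x, mean, var) * priors[k]
              for k, (mean, var) in enumerate(class_params)]
    return max(range(m), key=lambda k: scores[k])

# Hypothetical two-class example: class 0 ~ N(10, 4), class 1 ~ N(20, 4).
params = [(10.0, 4.0), (20.0, 4.0)]
labels = [classify_point(x, params) for x in (9.0, 14.0, 16.0, 21.0)]
# Strongly non-uniform priors can move the decision for a borderline pixel.
shifted = classify_point(14.0, params, priors=[0.01, 0.99])
```

With uniform priors the decision reduces to picking the class with the largest density at the observed value.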

Contextual information can be
incorporated into a decision rule of the
same general type by modifying the
discriminant function. Let the context at
image point $X_{ij}$ consist of
observations spatially near, but not
necessarily adjacent to, $X_{ij}$. Group
these observations along with $X_{ij}$ into
a vector of observations
$\underline{X}_{ij} = (X_1, X_2, \ldots, X_p)^T$
with $X_p = X_{ij}$ and the number of
observations taken as context being $p-1$
(the ordering is fixed but arbitrary). Call
the arrangement of pixels in
$\underline{X}_{ij}$ the p-context array.
Let the possible classes associated with
$\underline{X}_{ij}$ be
$\underline{\theta}^p = (\theta_1, \theta_2, \ldots, \theta_p)^T$,
where
$\theta_n \in \{\omega_1, \omega_2, \ldots, \omega_m\}$
and the ordering of the elements in
$\underline{\theta}^p$ coincides with that
in $\underline{X}_{ij}$. Assuming that the
observations are class-conditionally
independent gives a discriminant function
incorporating context as

$$g_k(\underline{X}_{ij}) = \sum_{\theta_1} \cdots \sum_{\theta_{p-1}} \left[ \prod_{n=1}^{p} p(X_n \mid \theta_n) \right] G(\underline{\theta}^p) \qquad (2)$$

where each sum runs over
$\omega_1, \ldots, \omega_m$ and $\theta_p$
is fixed as $\omega_k$ [2]. The context
distribution, $G(\underline{\theta}^p)$, is
the relative frequency of occurrence in the
scene of the class configuration in the
p-context array given by
$\underline{\theta}^p$. The similarity of
this discriminant function to the function
used by the maximum likelihood point
classifier becomes clearer by rewriting
$g_k(\underline{X}_{ij})$ as

$$g_k(\underline{X}_{ij}) = p(X_{ij} \mid \omega_k) \sum_{\theta_1} \cdots \sum_{\theta_{p-1}} \left[ \prod_{n=1}^{p-1} p(X_n \mid \theta_n) \right] G(\underline{\theta}^p)$$

where $\theta_p$ is again fixed as
$\omega_k$. The summation term carries the
contextual information and can be thought
of as an expanded context-carrying version
of $p(\omega_k)$ from the point classifier
case. This discriminant function is
identical to the no-context discriminant
function when $p=1$ since

$$G(\underline{\theta}^1) = p(\omega_k).$$
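
A direct evaluation of the contextual discriminant function can be sketched as follows. The sketch assumes the form of Eq. 2 described above (a sum over all class assignments of the p-1 context positions, with the last position fixed to the candidate class). The names `context_discriminant` and `classify_with_context`, the dictionary representation of G, and the numbers in the example are hypothetical.

```python
from itertools import product

def context_discriminant(k, densities, G, m):
    """Evaluate the contextual discriminant for candidate class k.

    densities[n][c] holds p(X_n | w_c) for position n of the p-context
    array; the last position is the pixel being classified.
    G maps a tuple of p class indices to its (possibly unnormalized)
    relative frequency.
    """
    p = len(densities)
    total = 0.0
    # Sum over all class assignments of the p-1 context positions.
    for ctx in product(range(m), repeat=p - 1):
        config = ctx + (k,)  # theta_p is fixed as w_k
        weight = G.get(config, 0.0)
        if weight == 0.0:
            continue  # configurations never observed contribute nothing
        term = weight
        for n, c in enumerate(config):
            term *= densities[n][c]
        total += term
    return total

def classify_with_context(densities, G, m):
    scores = [context_discriminant(k, densities, G, m) for k in range(m)]
    return max(range(m), key=lambda k: scores[k])

# Hypothetical example: one neighbor (p = 2), two classes.
# The neighbor looks strongly like class 0; the center pixel is ambiguous.
densities = [[0.9, 0.1], [0.45, 0.55]]
G = {(0, 0): 40, (1, 1): 40, (0, 1): 10, (1, 0): 10}
with_context = classify_with_context(densities, G, 2)

# With p = 1 and a uniform G the rule reduces to the point classifier.
no_context = classify_with_context([[0.45, 0.55]], {(0,): 1, (1,): 1}, 2)
```

In this example the context pulls the ambiguous center pixel toward the class of its neighbor, while the p = 1 case decides on the center density alone.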


III. ESTIMATING CONTEXT DISTRIBUTION: $G(\underline{\theta}^p)$

To evaluate $g_k(\underline{X}_{ij})$ we
must know values for the
$p(X_n \mid \theta_n)$ and
$G(\underline{\theta}^p)$. Methods for
estimating $p(X_n \mid \theta_n)$ are well
established from considerable experience in
using the no-context maximum likelihood
decision rule (as in Eq. 1) for
classification (see [1]). Optimal methods
for estimating $G(\underline{\theta}^p)$
are not yet established. Preliminary work
on finding practical methods for estimating
$G(\underline{\theta}^p)$ is presented in
[2].

The most successful method developed
to date for estimating
$G(\underline{\theta}^p)$ goes as follows:

1. Perform a no-context uniform-priors
classification on the training set,
restricting the classifier's decision rule
to choosing among spectral classes in the
correct information class.

2. Estimate the context distribution,
$G(\underline{\theta}^p)$, from the
resulting 100 percent accurate
classification of the training set by
counting the number of occurrences
(footnote 1) of all possible class
configurations given by
$\underline{\theta}^p$.
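
The counting in step 2 can be sketched as below. The function name `estimate_context_distribution`, the representation of the p-context array as a list of (row, column) offsets, and the choice to skip pixels whose context falls outside the image are assumptions made for illustration. The handling of image borders is not specified in the text.

```python
from collections import Counter

def estimate_context_distribution(label_map, offsets):
    """Count class configurations in a classified image.

    label_map: list of rows of class indices.
    offsets: (row, col) displacements of the context positions relative
             to the pixel being classified; that pixel is placed last.
    Returns a Counter keyed by tuples of p class indices (unnormalized).
    """
    rows, cols = len(label_map), len(label_map[0])
    counts = Counter()
    for i in range(rows):
        for j in range(cols):
            neighbors = [(i + di, j + dj) for di, dj in offsets]
            # Assumption: pixels whose context leaves the image are skipped.
            if any(not (0 <= r < rows and 0 <= c < cols) for r, c in neighbors):
                continue
            config = tuple(label_map[r][c] for r, c in neighbors)
            counts[config + (label_map[i][j],)] += 1
    return counts

label_map = [
    [0, 0, 1],
    [0, 0, 1],
    [1, 1, 1],
]
# One-neighbor context: the pixel directly to the north.
G = estimate_context_distribution(label_map, offsets=[(-1, 0)])
```

The counts are left unnormalized, which is sufficient for the decision rule because a common scale factor does not change which class has the largest discriminant value.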

This method was used on a 50-pixel
square area from the northeast corner of
the Large Area Crop Inventory Experiment
(LACIE) Segment No. 1860 in Hodgman
County, Kansas. The class-conditional
densities were estimated for the 16
spectral classes from randomly located
training fields scattered throughout the
entire 117-by-194 pixel Landsat data frame.
The coordinates of the training set fields
were chosen by selecting pixel coordinates
from a random number table and surrounding
the selected pixel by the largest
homogeneous rectangle (up to field size 20
by 20). The classifications were tested for
accuracy over five information classes
(pasture, idle, wheat, corn and alfalfa)
from "wall-to-wall" pixel-by-pixel ground
truth.

The restricted no-context
classification was performed over the first
25 lines of the 50-pixel-square area and
the context distribution was estimated over
those 25 lines. The classification results
were evaluated over the last 25 lines. The
results show (Table 1) that this method
produced an estimate of the context
distribution, $G(\underline{\theta}^p)$,
which in turn produced contextual
classifications with significant
improvement in classification accuracy over
the conventional uniform-priors no-context
classification on this data set.

1. The estimate of the context
distribution, $G(\underline{\theta}^p)$,
does not need to be normalized so as to be
an actual probability estimate. The
normalization factor does not affect the
classification decisions based on the
discriminant function in Eq. 2.

Table 1

CLASSIFICATION CLASS RESULTS ON LACIE DATA

                               Accuracy, %* (Lines 25-50)
Classification                 Overall   Average-by-Class

Uniform-priors no-context,
  unrestricted                  78.0          75.6
4 nearest neighbors             85.5          81.6
8 nearest neighbors             87.1          81.9

$G(\underline{\theta}^p)$ estimated from
restricted uniform-priors no-context
classification over lines 1-25.

* Classification performance can be
tabulated in two ways. Overall accuracy is
simply the overall number of correct
classifications divided by the total number
attempted. Average-by-class accuracy is
obtained by first computing the accuracy
for each class and taking the arithmetic
average of the class accuracies. The latter
is significant when the classification
results exhibit a tendency to discriminate
in favor of or against a subset of the
classes.
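
The two accuracy figures reported in Table 1 can be computed as in the following sketch. The function names and the small wheat/corn example are hypothetical and are meant only to show how the two measures can differ when one class dominates.

```python
def overall_accuracy(truth, predicted):
    """Percent of all test pixels classified correctly."""
    correct = sum(1 for t, p in zip(truth, predicted) if t == p)
    return 100.0 * correct / len(truth)

def average_by_class_accuracy(truth, predicted):
    """Arithmetic mean of the per-class percent accuracies."""
    per_class = {}
    for t, p in zip(truth, predicted):
        hits, total = per_class.get(t, (0, 0))
        per_class[t] = (hits + (t == p), total + 1)
    accuracies = [100.0 * hits / total for hits, total in per_class.values()]
    return sum(accuracies) / len(accuracies)

# Hypothetical test set: 8 wheat pixels, 2 corn pixels, one corn pixel missed.
truth = ["wheat"] * 8 + ["corn"] * 2
predicted = ["wheat"] * 8 + ["wheat", "corn"]
overall = overall_accuracy(truth, predicted)
by_class = average_by_class_accuracy(truth, predicted)
```

Here the overall accuracy is high because wheat dominates, while the average-by-class figure exposes the poor performance on corn.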


While this method can produce good
estimates of the context distribution, it
suffers the limitation that a sufficient
number of blocks of ground truth of
sufficient size are needed to make an
accurate estimate of the context
distribution. This method cannot be used at
all when blocks of ground truth data are
not available, whereas the conditional
probabilities can be estimated from ground
truth at random pixel locations.

Another possible method of estimating
the context distribution would be to base
the estimate on a uniform-priors no-context
classification. Such an estimate might then
be refined by basing a new estimate on the
context classification made using the first
context distribution estimate. The
estimates might even be iterated until the
estimate producing the most accurate
classification over the training set is
found. (The final result should then be
evaluated on a test set disjoint from the
training set.)

Results from a straightforward
implementation of this iterative
"bootstrap" method were reported earlier in
[2]. Estimates of the context distribution
were made by counting the number of
occurrences of all possible class
configurations in the appropriate
classification. While this method produced
excellent results when simulated data were
used, results using real Landsat data were
disappointing.

It is thought that the no-context
uniform-priors classifications of real
Landsat data simply did not produce an
accurate enough classification for the
"bootstrap" method to work. The
classification of the simulated data was
accurate enough because the
class-conditional probabilities
$p(X_n \mid \theta_n)$ were modeled
exactly, whereas the class-conditional
probabilities were not modeled exactly on
the real data classifications. This
resulted in estimates of the context
distribution, $G(\underline{\theta}^p)$, in
the real data cases that contained more
spurious class configuration counts than in
the simulated case, which in turn gave
poorer context classification results in
the real data case.

There are several ways in which the
context distribution estimates from real
data no-context classifications could be
"cleaned up." One could employ a threshold
procedure which deletes all class
configurations with counts below a certain
number. Another approach would be to divide
each class configuration count by a fixed
number and take the integer part of the
result as the new count, deleting all class
configurations with counts that become
zero.

Yet another method for reducing the
effect of spurious class configuration
counts is to raise each count to a power
and use the result as the context
distribution estimate. For powers greater
than one, the class configurations with
larger counts are favored even more heavily
versus those with relatively small counts
in the discriminant function in Eq. 2.
Conversely, for powers less than one, the
class configurations with large counts are
less heavily favored. Going to the extreme
of a power of zero results in all class
configurations being equally favored, as in
a uniform-priors no-context classification.
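
The three ways of reducing the effect of spurious counts (threshold, integer division, and power) can be sketched as simple transformations of a table of configuration counts. The function names, the dictionary representation and the example counts are hypothetical. The threshold is read here as keeping counts greater than or equal to the given minimum.

```python
def threshold_counts(counts, minimum):
    """Delete class configurations whose count is below `minimum`."""
    return {cfg: n for cfg, n in counts.items() if n >= minimum}

def divide_counts(counts, divisor):
    """Integer-divide every count and drop configurations that become zero."""
    reduced = {cfg: n // divisor for cfg, n in counts.items()}
    return {cfg: n for cfg, n in reduced.items() if n > 0}

def power_counts(counts, power):
    """Raise every count to a power; the result is left unnormalized."""
    return {cfg: float(n) ** power for cfg, n in counts.items()}

# Hypothetical one-neighbor counts with two small, possibly spurious entries.
counts = {(0, 0): 40, (1, 1): 30, (0, 1): 3, (1, 0): 1}
thresholded = threshold_counts(counts, 3)
divided = divide_counts(counts, 4)
squared = power_counts(counts, 2)
flattened = power_counts(counts, 0)
```

Unlike the other two methods, the power transformation never removes a configuration. It only changes how heavily each configuration is weighted relative to the others.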

This power method was first tried on a 
simulated data set to investigate the me- 
thod's characteristics undisturbed by un- 
known effects from inaccurate modeling in 
the real data sets. This simulated data 
set [2] was generated from a very accurate
no-context classification of Landsat-1 data 
from an urban area (Grand Rapids, Michigan). 
A 50-pixel-square segment was used in the 
tests. See Figure 1 for a summary of the 
results. The results seem to indicate 
that when the model is exact, as the power 
used is increased (to a certain point), the 
classification results tend towards the re- 
sults obtained when the context distribu- 
tion is estimated from ground truth. Also, 
as expected, as the power used is decreased 
below one, the results tend toward a uni- 
form-priors no-context classification. 

The power method was also used on a 
50-pixel-square segment of Landsat data 
containing approximately equal amounts of 
urban and agricultural area located to the 
southeast of Bloomington, Indiana. Statis- 
tics for the spectral classes were estima- 
ted using the 100-pixel-square area center- 
ed on the 50-pixel-square segment. A very 
careful uniform-priors no-context classifi- 
cation using 14 spectral classes was per- 
formed to delineate agricultural, urban and 
forested areas. As there were too few fo- 
rested pixels to delineate forest test a- 
reas reliably, the classification was test- 
ed only for accuracy in classifying the ag- 
ricultural and urban classes. Out of the 
2500 pixels in the segment, a total of 867 
pixels were manually interpreted as agri- 
culture and 450 pixels as urban. The iden- 
tification was made by interpretation of 
color infrared photography taken by air- 
craft on the same day as the Landsat pass. 


As mentioned earlier, a straightfor- 
ward implementation of the iterative boot- 
strap method of estimating the context dis- 
tribution for this data set produced disap- 
pointing results. Whereas the no-context 
uniform-priors classification had an over- 
all accuracy of 83.1 percent and average- 
by-class accuracy of 82.7 percent, the 
best the bootstrap method could do in three 
iterations was 85.3 percent overall accura- 
cy and 84.8 percent average-by-class accu- 
racy. The fourth iteration produced no 
improvement. 

Figure 2 summarizes the results using
the power method on two-nearest-neighbors
context (neighbors to the north and east)
based on an estimate of
$G(\underline{\theta}^p)$ from the
no-context uniform-priors classification.
Trading off overall accuracy against
average-by-class accuracy, the best
classification was produced using a power
of 5, for which an overall accuracy of 87.0
percent and average-by-class accuracy of
86.1 percent were achieved. This nearly
doubled the accuracy improvement over the
no-context classification produced by the
straight bootstrap method. Note also that
the results in Figure 2 follow the general
trend of the simulated data results in
Figure 1.

A second iteration of estimating the
context distribution,
$G(\underline{\theta}^p)$, was then made
based on the classifications listed in
Figure 2. The second estimate of
$G(\underline{\theta}^p)$ based on the
classification using the first estimate
raised to a power of 10 produced the best
classification results with an overall
accuracy of 88.5 percent and an
average-by-class accuracy of 87.5 percent
(using $G(\underline{\theta}^p)$ raised to
a power of 5). See Table 2 and Figure 3 for
a summary of results. This second estimate
of $G(\underline{\theta}^p)$ gave a total
5.4 percent improvement in overall accuracy
and 4.8 percent improvement in
average-by-class accuracy over the
no-context classification. Even though
these improvements are not as large as in
the results using simulated data, or using
the more restrictive method on real data,
these results are certainly encouraging.


Table 2

SECOND ITERATION POWER METHOD RESULTS

Best four nearest-neighbor classifications
with $G(\underline{\theta}^p)$ based on the
classification in Figure 2.

Power Used   Power Used in This      Accuracy, %
in Fig. 2    Classification       Overall   Average-by-Class

    2              5               86.5          85.6
    3              5               86.3          85.7
    5              5               87.3          86.7
    7              5               88.1          87.2
   10              5               88.5          87.5
   15              3               87.7          87.2


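Prior to making the second iteration
estimate of $G(\underline{\theta}^p)$
above, it was assumed that the more
accurate a classification was, the more
accurate the estimate of
$G(\underline{\theta}^p)$ from it would be.
The results quoted here show clearly that
this is not always the case: the best
second estimate came from the
first-iteration classification made with a
power of 10, not from the most accurate
first-iteration classification (power of
5). Further study is required before it can
be determined whether this type of behavior
is typical, and before this behavior can be
exploited optimally.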

FIGURE 1. Power method results using as
context one-nearest-neighbor (south) on the
simulated data set. Context distribution,
$G(\underline{\theta}^p)$, estimated from
uniform-priors no-context classification,
except where noted otherwise.

FIGURE 2. Power method results using
two-nearest-neighbors (north and east)
context on Bloomington, IN data set.
Context distribution,
$G(\underline{\theta}^p)$, estimated from
uniform-priors no-context classification.

FIGURE 3. Power method results using
four-nearest-neighbors context on
Bloomington, IN data set. Context
distribution, $G(\underline{\theta}^p)$,
estimated from two-nearest-neighbor (north
and east) context classification with
context distribution raised to power 10.


IV. PRACTICAL CONSIDERATIONS 

The general approach to estimating the
context distribution, as suggested by the
results reported in the previous section,
can involve a large number of context
classifications before the best estimate is
found. In addition to determining the best
power of the context distribution to use at
each iteration, the best p-context array
(how many and which neighbor(s) to use)
needs to be determined at each iteration.

The size and shape of the p-context
array directly affect computation cost and
classification accuracy. Generally, the
larger the p-context array, the higher the
computation cost. When the classification
from which the context distribution is
estimated is sufficiently accurate, larger
p-context arrays yield higher
classification accuracies. Less accurate
template classifications can result in
cases where a large p-context array will
produce a classification that is less
accurate than the no-context
classification. Also, p-context arrays of
given size may produce differing
classification accuracies, depending on the
shapes of the arrays. It would be desirable
to be able to predict the optimal size and
shape of the p-context array and the best
power of the context distribution to use at
each iteration before any actual
classifications are performed.
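
The growth in computation cost with array size can be made concrete with a short calculation. The sketch assumes that the contextual discriminant function is evaluated by brute force over every class assignment of the p-1 context positions, so that one class at one pixel requires m^(p-1) terms. The function name is hypothetical, and an implementation that visits only configurations with nonzero counts would do less work.

```python
def configurations_per_class(m, p):
    """Number of terms in a brute-force contextual sum for one class."""
    return m ** (p - 1)

# 16 spectral classes, as in the LACIE example; p = 9 corresponds to
# the pixel being classified plus its eight nearest neighbors.
term_counts = {p: configurations_per_class(16, p) for p in (1, 3, 5, 9)}
```

Going from four to eight nearest neighbors multiplies the brute-force term count by 16^4 = 65536, which is why predicting a good small array in advance is attractive.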

V. ESTIMATION OF OPTIMAL P-CONTEXT
ARRAY AND POWER

A theoretical measure of context has
been developed from the perspective of
applying this measure to predicting the
optimal p-context array. This same measure
may also be useful in estimating the best
power of the context distribution to use.

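Suppose that the relative frequency
function $G(\underline{\theta}^p)$ is such
that it can be written in factored form,
i.e.,

$$G(\underline{\theta}^p) = G(\underline{\theta}_1^q)\, G(\underline{\theta}_2^{p-q}) \qquad (3)$$

where $\underline{\theta}_1^q$ and
$\underline{\theta}_2^{p-q}$ are,
respectively, $q$ and $p-q$ vectors of
classes. The last element of
$\underline{\theta}_2^{p-q}$ is the same as
the last element of $\underline{\theta}^p$.
If this factorization can indeed be
realized, Eq. 2 can be rewritten as

$$g_k(\underline{X}_{ij}) = \left[ \sum_{\underline{\theta}_1^q} \prod_{n=1}^{q} p(X_n \mid \theta_n)\, G(\underline{\theta}_1^q) \right] \left[ \sum_{\underline{\theta}_2^{p-q}} \prod_{n=q+1}^{p} p(X_n \mid \theta_n)\, G(\underline{\theta}_2^{p-q}) \right] \qquad (4)$$

where $\theta_p = \omega_k$ and the last
element of $\underline{\theta}_2^{p-q}$ is
$\theta_p$, so that the second sum runs
only over the remaining $p-q-1$ elements.
Since the term in the first set of brackets
is independent of $k$, it is just a
constant term that can be ignored when
classifying point $(i,j)$. When such a
factorization as in Eq. 3 can be made, we
can reduce the size of the p-context array,
reducing computation cost with no loss in
classification accuracy.

Fig. 4. Pixel locations used in testing
$\Delta G_q^p$:

    1  2  3
    4  5  6
    7  8  9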

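If $G(\underline{\theta}^p)$ can be
factored as in Eq. 3, it is clear that the
distribution $G(\underline{\theta}^p)$ is
one of independence for
$\underline{\theta}_1^q$ and
$\underline{\theta}_2^{p-q}$. This suggests
that a measure of nonredundant contextual
information from the pixel positions in
$\underline{\theta}_1^q$ as compared to
that from the pixel positions in
$\underline{\theta}_2^{p-q}$ would be a
measure of departure from independence for
$\underline{\theta}_1^q$ and
$\underline{\theta}_2^{p-q}$ in the
distribution $G(\underline{\theta}^p)$. A
possible measure of this departure would be

$$\Delta G_q^p = \sum_{i_1=1}^{m} \cdots \sum_{i_p=1}^{m} \left( G(\underline{\theta}_1^q)\, G(\underline{\theta}_2^{p-q}) - G(\underline{\theta}^p) \right)^2 \qquad (5)$$

where $\theta_n = \omega_{i_n}$ and
$G(\underline{\theta}_1^q)$ and
$G(\underline{\theta}_2^{p-q})$ are now the
marginals of $G(\underline{\theta}^p)$.
Other distributions of independence with
marginal $G(\underline{\theta}_2^{p-q})$
and other measures of departure from
$G(\underline{\theta}^p)$ could be used.
This particular form for $\Delta G_q^p$ is
attractive because it is particularly easy
to calculate.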
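
For the simplest case of one context pixel (p = 2, q = 1) the measure can be computed directly from a table of pair counts, as in the sketch below. Two assumptions are made and should be checked against the original equation: the departure is taken as a sum of squared differences between the product of marginals and the joint frequency, and the counts are normalized to relative frequencies before use. The function name `delta_g` and both example tables are hypothetical.

```python
def delta_g(joint):
    """Departure from independence for a two-position context (p=2, q=1).

    joint maps (class at position 1, class at position 2) to a count or
    relative frequency; it is normalized to sum to one before use.
    Assumed form: sum over all class pairs of
    (marginal_1 * marginal_2 - joint) ** 2.
    """
    total = float(sum(joint.values()))
    G = {cfg: n / total for cfg, n in joint.items()}
    classes = sorted({c for cfg in G for c in cfg})
    marg1 = {a: sum(G.get((a, b), 0.0) for b in classes) for a in classes}
    marg2 = {b: sum(G.get((a, b), 0.0) for a in classes) for b in classes}
    return sum((marg1[a] * marg2[b] - G.get((a, b), 0.0)) ** 2
               for a in classes for b in classes)

# Neighbor class tells nothing about the center class.
independent = {(0, 0): 25, (0, 1): 25, (1, 0): 25, (1, 1): 25}
# Neighbor class always equals the center class.
dependent = {(0, 0): 50, (1, 1): 50}
dg_independent = delta_g(independent)
dg_dependent = delta_g(dependent)
```

A table that factors exactly gives a value of zero, and the value grows as the neighbor becomes more informative about the center pixel.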

The "context measure" $\Delta G_q^p$
can be used to estimate the optimal
p-context array in the following way:
Establish $\underline{\theta}_2^{p-q}$ as a
fixed core (p-q)-context array. Calculate
the values of $\Delta G_q^p$ for various
q-context arrays as
$\underline{\theta}_1^q$, distinct from the
core array. The best p-context array for
$\underline{\theta}^p$ would be
$\underline{\theta}_2^{p-q}$ combined with
the $\underline{\theta}_1^q$ that produced
the largest value for $\Delta G_q^p$. This,
of course, assumes that the contextual
information contributed by
$\underline{\theta}_1^q$ is not so
erroneous that it would actually decrease
classification accuracy. This may not be a
reasonable assumption in all cases.
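
The selection procedure can be sketched for the case where the core array is the pixel being classified and each candidate is a single neighbor. As before, the squared-difference form of the measure and the normalization of counts are assumptions, and the function names, the offsets and the small label map are hypothetical.

```python
from collections import Counter

def joint_counts(label_map, offset):
    """Count (neighbor class, center class) pairs for one neighbor offset."""
    rows, cols = len(label_map), len(label_map[0])
    di, dj = offset
    counts = Counter()
    for i in range(rows):
        for j in range(cols):
            r, c = i + di, j + dj
            if 0 <= r < rows and 0 <= c < cols:
                counts[(label_map[r][c], label_map[i][j])] += 1
    return counts

def delta_g(joint):
    """Assumed squared-difference departure from independence (p=2, q=1)."""
    total = float(sum(joint.values()))
    G = {cfg: n / total for cfg, n in joint.items()}
    classes = sorted({c for cfg in G for c in cfg})
    marg1 = {a: sum(G.get((a, b), 0.0) for b in classes) for a in classes}
    marg2 = {b: sum(G.get((a, b), 0.0) for a in classes) for b in classes}
    return sum((marg1[a] * marg2[b] - G.get((a, b), 0.0)) ** 2
               for a in classes for b in classes)

def rank_neighbors(label_map, offsets):
    """Return (measure, offset) pairs, largest measure first."""
    scored = [(delta_g(joint_counts(label_map, off)), off) for off in offsets]
    return sorted(scored, reverse=True)

# Hypothetical scene: every row is a single class, so the east neighbor
# always matches the center pixel, while the north neighbor does not help.
label_map = [
    [0, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
    [1, 1, 1],
    [0, 0, 0],
]
ranked = rank_neighbors(label_map, offsets=[(-1, 0), (0, 1)])
```

In this constructed scene the east neighbor ranks first, and it would be combined with the core array under the procedure described above.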

The first test of $\Delta G_q^p$ was
made on the simulated data with $p=2$ and
$q=1$ and the context distributions
estimated from the ground truth. The
context arrays $\underline{\theta}_1^1$ and
$\underline{\theta}_2^1$ were defined with
respect to the pixel locations defined in
Figure 4. $\underline{\theta}_2^1$ was
first fixed as pixel position 5 and
$\underline{\theta}_1^1$ was varied over
the remaining positions.
$\underline{\theta}_2^1$ was also later
fixed as pixel position 1 with
$\underline{\theta}_1^1$ varied over the
pixel positions relative to position 1 not
covered previously (i.e., positions 3, 6,
7, 8 and 9).

Table 3

$\Delta G_q^p$ TESTED ON SIMULATED DATA
WITH CONTEXT DISTRIBUTIONS ESTIMATED FROM
GROUND TRUTH

$\underline{\theta}_1^1$  $\underline{\theta}_2^1$                      Accuracy, %
Pixel      Pixel      $\Delta G_q^p$              Average-
Location   Location   $\times 10^4$    Overall   by-Class

   8          5          5.09           92.7       74.0
   2          5          4.99           91.6       73.5
   4          5          4.90           91.7       71.8
   6          5          4.90           91.7       73.9
   7          5          3.42           90.8       71.2
   3          5          3.31           90.4       69.8
   9          5          3.26           90.6       70.6
   1          5          3.19           90.6       70.1
   7          1          2.58           90.3       68.6
   3          1          2.27           90.2       70.3
   8          1          1.98           89.4       67.9
   6          1          1.87           90.4       70.2
   9          1          1.53           89.9       69.5

Table 4

$\Delta G_q^p$ TESTED ON SIMULATED DATA
WITH CONTEXT DISTRIBUTIONS ESTIMATED FROM
UNIFORM-PRIORS NO-CONTEXT CLASSIFICATION

$\underline{\theta}_1^1$  $\underline{\theta}_2^1$                      Accuracy, %
Pixel      Pixel      $\Delta G_q^p$              Average-
Location   Location   $\times 10^5$    Overall   by-Class

   8          5          7.56           79.8       81.7
   2          5          7.30           79.1       81.9
   4          5          6.13           78.8       80.6
   6          5          6.11           79.0       81.4
   7          5          4.71           78.8       80.9
   3          5          4.53           78.6       80.6
   9          5          4.28           78.4       80.6
   1          5          4.22           78.3       79.7
   7          1          3.77           78.5       80.9
   8          1          2.73           78.0       80.0
   3          1          2.65           78.0       80.9
   6          1          2.31           78.0       80.8
   9          1          2.17           78.0       80.1

As can be seen in Table 3,
$\Delta G_q^p$ clearly predicted that the
best neighbor to use for context would be
any of the four nearest neighbors (pixel
positions 2, 4, 6 or 8 relative to position
5). $\Delta G_q^p$ did not so clearly
predict which nearest neighbor was best.

$\Delta G_q^p$ was again tested on the
simulated data, but this time with the
context distributions estimated from the
uniform-priors no-context classification.
As shown in Table 4, in this case
$\Delta G_q^p$ again tended to predict the
best p-context array. This time
$\Delta G_q^p$ predicted pixel position 8
to be the best neighboring pixel to use as
context while pixel position 2 came in as a
close second. These predictions held up
quite well when compared to the
classification accuracies. The distinctions
among the remaining pixels, however, were
not predicted as clearly.

A test of $\Delta G_q^p$ was also made
using the Bloomington, Indiana Landsat data
with the context distributions estimated
from the uniform-priors no-context
classification (see Table 5). Here
$\Delta G_q^p$ did not predict the best
p-context array as well as in the simulated
data case. $\Delta G_q^p$ does correlate
positively with the accuracy results, but
the correlation is fairly weak. It seems
that the context here is too erroneous for
the predictor to function properly.

Table 5

$\Delta G_q^p$ TESTED ON BLOOMINGTON, IND.
LANDSAT DATA SET. CONTEXT DISTRIBUTIONS
ESTIMATED FROM UNIFORM-PRIORS NO-CONTEXT
CLASSIFICATION

$\underline{\theta}_1^1$  $\underline{\theta}_2^1$                      Accuracy, %
Pixel      Pixel      $\Delta G_q^p$              Average-
Location   Location   $\times 10^5$    Overall   by-Class

   4          5          7.69           84.2       83.8
   6          5          7.68           84.6       84.1
   2          5          5.40           85.2       84.8
   8          5          5.31           83.8       83.4
   3          5          3.79           84.2       83.8
   7          5          3.61           84.0       83.5
   1          5          3.04           84.4       84.1
   9          5          2.96           83.7       83.2

It was then checked to see if
$\Delta G_q^p$ could be used to predict the
power of the context distribution to use
for a particular p-context array.
$\underline{\theta}_2^1$ was set as
position 5 and $\underline{\theta}_1^2$ was
set as positions 2 and 6. The power used
was varied as previously (see Figure 2).
[NOTE: $G(\underline{\theta}^p)^\alpha$ was
normalized for each value of the power
$\alpha$ so as to remain a probability
estimate.]

Table 6

$\Delta G_q^p$ EVALUATED AS A PREDICTOR OF
BEST CONTEXT DISTRIBUTION POWER ON
BLOOMINGTON, INDIANA, DATA SET

$\underline{\theta}_1^2$ = pixel locations 2 & 6
$\underline{\theta}_2^1$ = pixel location 5

Context distributions estimated from
uniform-priors no-context classification

                                    Accuracy, %
                                            Average-
Power    $\Delta G_q^p$           Overall   by-Class

  0.5    $2.87 \times 10^{-7}$     84.4       84.0
  0.8    $8.23 \times 10^{-7}$     84.9       84.4
  1.0    $2.05 \times 10^{-6}$     85.0       84.5
  1.2    $4.81 \times 10^{-6}$     85.0       84.5
  1.4    $9.27 \times 10^{-6}$     85.1       84.5
  1.6    $1.37 \times 10^{-5}$     85.2       84.5
  2.0    $1.34 \times 10^{-5}$     85.4       84.8
  3.0    $1.20 \times 10^{-6}$     86.3       85.9
  5.0    $4.04 \times 10^{-9}$     87.0       86.1
  7.0    $1.98 \times 10^{-11}$    87.2       85.0
 10.0    underflow                 86.4       82.5

In Table 6, $\Delta G_q^p$ shows a
distinct pattern of behavior as the power
of the context distribution is varied. As
the power is increased from one,
$\Delta G_q^p$ increases at first and then
decreases. In this case, the power at which
$\Delta G_q^p$ falls to approximately its
value in the power-of-one case corresponds
closely to the power that yields the
highest classification accuracies. As the
power is increased further, $\Delta G_q^p$
decreases sharply. When the power is
increased to the value that produces the
classification that in turn produces the
best context distribution estimate (in this
case, a power of 10), $\Delta G_q^p$ is so
small that it cannot be calculated in the
precision used.
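
The way the measure can be swept over the power of the context distribution is sketched below. For brevity the sketch uses a single neighbor (p = 2, q = 1) rather than the two-neighbor array of Table 6, assumes the squared-difference form of the measure, and uses hypothetical counts and function names. It is not an attempt to reproduce the tabulated values.

```python
def normalized_power(counts, alpha):
    """Raise each count to the power alpha and renormalize to sum to one."""
    powered = {cfg: float(n) ** alpha for cfg, n in counts.items()}
    total = sum(powered.values())
    return {cfg: v / total for cfg, v in powered.items()}

def delta_g(G):
    """Assumed squared-difference departure from independence (p=2, q=1)."""
    classes = sorted({c for cfg in G for c in cfg})
    marg1 = {a: sum(G.get((a, b), 0.0) for b in classes) for a in classes}
    marg2 = {b: sum(G.get((a, b), 0.0) for a in classes) for b in classes}
    return sum((marg1[a] * marg2[b] - G.get((a, b), 0.0)) ** 2
               for a in classes for b in classes)

# Hypothetical one-neighbor counts.
counts = {(0, 0): 40, (1, 1): 30, (0, 1): 6, (1, 0): 4}
powers = (0.5, 1.0, 2.0, 5.0, 10.0, 50.0)
sweep = [(alpha, delta_g(normalized_power(counts, alpha))) for alpha in powers]
```

At very large powers the normalized distribution concentrates on the single most frequent configuration, so the measure tends toward zero. This is consistent with the sharp decrease and eventual underflow noted above.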

Further investigation with this and
other data sets is needed to determine
whether this is a universal pattern that
can be exploited in estimating the power of
the context distribution that yields the
best classification results. These results
make it seem unlikely, however, that
$\Delta G_q^p$ could be used to predict the
power which produces the best context
distribution estimate.

VI. CONCLUDING REMARKS

The multispectral maximum likelihood
classifier has been extended to include
contextual information from arbitrary
points near, but not necessarily adjacent
to, the point being classified. The
successful application of this statistical
context classifier depends, however, upon
the successful estimation of the a priori
context distribution,
$G(\underline{\theta}^p)$. A method has
been developed which can provide good
estimates of the context distributions,
assuming that blocks of representative
ground truth are available.

Attempts at developing a more general 
"bootstrap" method of estimating the con- 
text distribution have not yet been totally 
successful. Encouraging results have been 
obtained on the one data set tested by using 
the power method described in this paper.
It is not clear, however, whether the
p-context arrays and powers used in testing
the power method were actually optimal.
Other methods of producing cleaned-up
context distribution estimates, such as the
threshold method or division method, have
yet to be tested. Further, practical
application of
these bootstrap methods is clouded by the 
need to run several classifications to deter- 
mine the best p-context array and the power 
of the context distribution to use at each 
iteration. 

A theoretical basis for predicting the
best p-context array has been developed. As
with the power method itself, this
predictor has only been tested on one data
set. These preliminary results do
nevertheless indicate certain trends
warranting further study with other data
sets.

It seems that the $\Delta G_q^p$
predictor does not necessarily strongly
correlate with classification accuracy
where the available contextual information
is somewhat inaccurate. This considers only
the initial iteration two-neighbors
classification and not the second iteration
four-neighbors or third iteration
eight-neighbors classification results. A
stronger correlation between the best
initial p-context array as predicted by
$\Delta G_q^p$ and the classification
results may appear in the eight-neighbor
results.

This same predictor was also tested
with respect to determining (in some sense)
the best power of the context distribution
for the power method. Preliminary results
indicate that the predictor may hold some
promise in finding the power of the context
distribution which produces the best
classification results. It does not seem
likely, however, that the predictor can be
used for finding the power of the context
distribution which produces the best
context distribution estimate for the next
iteration.

It must be emphasized that the above 
results are provisional as they are based 
on a study of only one data set. They must 
be confirmed by studies involving other data 
sets. Quite possibly, no reliable estimation 
procedure simpler than actually performing a 
contextual classification can be found. If 
this is the case, the most effective way 
to "estimate" the best p-context array and 
context distribution power would be to
perform contextual classifications on
representative portions of the scene before
the total scene is classified.